
    RITA: a Study on Scaling Up Generative Protein Sequence Models

    In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
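
    A minimal sketch of how such a model is typically used for zero-shot fitness estimation: a candidate sequence is scored by its log-likelihood under the autoregressive model, and variants are ranked by that score. The checkpoint name and tokenizer handling below are assumptions (the released RITA checkpoints on the Hugging Face Hub may require trust_remote_code and their own tokenizer setup), not a prescription from the paper.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "lightonai/RITA_s"  # assumed checkpoint name; substitute as needed

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    def sequence_log_likelihood(seq: str) -> float:
        """Sum of per-token log-probabilities; higher means more 'natural' to the model."""
        ids = tokenizer(seq, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Next-token prediction: logits at position t score the token at position t+1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        return log_probs.gather(-1, targets.unsqueeze(-1)).sum().item()

    # Rank sequence variants by log-likelihood as a zero-shot fitness proxy.
    variants = ["MKTAYIAKQR", "MKTAYIAKQW", "MKTAYIAKQA"]
    print(sorted(variants, key=sequence_log_likelihood, reverse=True))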

    The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

    Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers. This curation process is believed to be necessary to produce performant models with broad zero-shot generalization abilities. However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable curation is and whether we will run out of unique high-quality data soon. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming state-of-the-art models trained on The Pile. Despite extensive filtering, the high-quality data we extract from the web is still plentiful, and we are able to obtain five trillion tokens from CommonCrawl. We publicly release an extract of 600 billion tokens from our RefinedWeb dataset, and 1.3B and 7.5B parameter language models trained on it.
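
    The filter-then-deduplicate recipe can be illustrated with a toy sketch. The real pipeline relies on far richer heuristics and fuzzy (MinHash) deduplication, so the thresholds and rules below are purely illustrative assumptions.

    import hashlib
    import re

    def passes_quality_filter(doc: str) -> bool:
        """Crude stand-ins for document-level filtering heuristics (assumed thresholds)."""
        words = doc.split()
        if len(words) < 50:                                   # too short to be useful
            return False
        if sum(len(w) for w in words) / len(words) > 12:      # implausibly long average word
            return False
        if len(re.findall(r"[^\x00-\x7F]", doc)) / max(len(doc), 1) > 0.3:
            return False                                      # mostly non-text bytes
        return True

    def exact_dedup(docs):
        """Keep only the first occurrence of each whitespace-normalized document."""
        seen, kept = set(), []
        for doc in docs:
            key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append(doc)
        return kept

    corpus = ["a raw CommonCrawl document " * 20, "a raw CommonCrawl document " * 20, "tiny"]
    cleaned = exact_dedup([d for d in corpus if passes_quality_filter(d)])
    print(f"{len(corpus)} documents -> {len(cleaned)} after filtering and deduplication")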

    LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor

    We introduce LightOn's Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing a seamless integration within Python-based processing pipelines. We discuss a variety of use cases and hybrid network architectures, with the OPU used in combination with CPUs/GPUs, and draw a pathway towards "optical advantage". Comment: Proceedings of IEEE Hot Chips 33, 2021.
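
    As a rough illustration of the computation such an accelerator performs, the sketch below simulates a random projection followed by an intensity measurement, y = |Wx|^2 with W a fixed complex Gaussian matrix. The matrix model and dimensions are assumptions for illustration, not the device's measured transfer function.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out = 1_000, 10_000      # such devices target very large output dimensions

    # Fixed random transmission matrix, played by a scattering medium in hardware.
    W = (rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))) / np.sqrt(2)

    def opu_like_projection(x: np.ndarray) -> np.ndarray:
        """Nonlinear random features: the intensity of a random complex projection."""
        return np.abs(W @ x) ** 2

    x = rng.standard_normal(d_in)
    features = opu_like_projection(x)
    print(features.shape)            # (10000,), usable as random features downstream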

    What Language Model to Train if You Have One Million GPU Hours?

    The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. In the process of building BLOOM--the Big Science Large Open-science Open-access Multilingual language model--our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience. Comment: Findings of EMNLP 2022.
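
    A back-of-the-envelope version of the budgeting question posed above can be written down explicitly. The throughput and utilization figures below are illustrative assumptions rather than numbers taken from the paper, and the 6*N*D cost rule is the usual approximation for dense Transformer training.

    GPU_HOURS = 1_000_000
    PEAK_TFLOPS = 312            # A100 bf16 peak throughput
    UTILIZATION = 0.30           # assumed end-to-end fraction of peak actually sustained

    total_flops = GPU_HOURS * 3600 * PEAK_TFLOPS * 1e12 * UTILIZATION

    # Training an N-parameter dense Transformer on D tokens costs roughly 6*N*D FLOPs.
    for n_params in (1e9, 13e9, 176e9):
        d_tokens = total_flops / (6 * n_params)
        print(f"{n_params / 1e9:>6.0f}B params -> ~{d_tokens / 1e9:,.0f}B tokens within budget")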

    Memory consolidation in the cerebellar cortex

    Several forms of learning, including classical conditioning of the eyeblink, depend upon the cerebellum. In examining mechanisms of eyeblink conditioning in rabbits, reversible inactivations of the control circuitry have begun to dissociate aspects of cerebellar cortical and nuclear function in memory consolidation. It was previously shown that post-training cerebellar cortical, but not nuclear, inactivations with the GABA(A) agonist muscimol prevented consolidation, but these findings left open the question of how final memory storage was partitioned across cortical and nuclear levels. Memory consolidation might be essentially cortical and directly disturbed by actions of the muscimol, or it might be nuclear, and sensitive to the raised excitability of the nuclear neurons following the loss of cortical inhibition. To resolve this question, we simultaneously inactivated cerebellar cortical lobule HVI and the anterior interpositus nucleus of rabbits during the post-training period, thereby protecting the nuclei from the disinhibitory effects of cortical inactivation. Consolidation was impaired by these simultaneous inactivations. Because direct application of muscimol to the nuclei alone has no impact upon consolidation, we can conclude that post-training consolidation processes and memory storage for eyeblink conditioning have critical cerebellar cortical components. The findings are consistent with a recent model suggesting that the distribution of learning-related plasticity across cortical and nuclear levels is task-dependent. There can be transfer to nuclear or brainstem levels for the control of high-frequency responses, but learning with lower-frequency response components, such as in eyeblink conditioning, remains mainly dependent upon cortical memory storage.

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
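
    A minimal usage sketch with the openly released checkpoints: the 176B model itself requires multi-GPU serving, so the smaller family member used below is an assumption chosen to keep the example self-contained.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "bigscience/bloom-560m"   # small member of the BLOOM family

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # BLOOM is multilingual, so prompting in French (or another of its 46 natural languages) works too.
    prompt = "Le modèle de langue BLOOM a été entraîné"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))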

    Linear Optical Random Projections Without Holography

    We introduce what we believe to be a novel method to perform linear optical random projections without the need for holography. Our method consists of a computationally trivial combination of multiple intensity measurements to mitigate the information loss usually associated with the absolute-square non-linearity imposed by optical intensity measurements. Both experimental and numerical findings demonstrate that the resulting matrix consists of real-valued, independent, and identically distributed (i.i.d.) Gaussian random entries. Our optical setup is simple and robust, as it does not require interference between two beams. We demonstrate the practical applicability of our method by performing dimensionality reduction on high-dimensional data, a common task in randomized numerical linear algebra with relevant applications in machine learning.
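
    One way to see how intensity-only measurements can be combined into a linear projection is the polarization identity |A(x+r)|^2 - |A(x-r)|^2 = 4*Re[(Ax) conj(Ar)], simulated below with an assumed complex Gaussian transmission matrix. This illustrates the general idea and is not claimed to be the paper's exact measurement scheme.

    import numpy as np

    rng = np.random.default_rng(1)
    d_in, d_out = 512, 2_048

    A = (rng.standard_normal((d_out, d_in)) + 1j * rng.standard_normal((d_out, d_in))) / np.sqrt(2)
    r = rng.standard_normal(d_in)            # fixed reference pattern displayed alongside x

    def intensity(v: np.ndarray) -> np.ndarray:
        """What the camera records: only |A v|^2, never the complex field A v itself."""
        return np.abs(A @ v) ** 2

    def linear_projection(x: np.ndarray) -> np.ndarray:
        """Combine two intensity measurements into a signed readout that is linear in x."""
        return (intensity(x + r) - intensity(x - r)) / 4.0

    # Sanity check of linearity: projecting a sum equals the sum of the projections.
    x1, x2 = rng.standard_normal(d_in), rng.standard_normal(d_in)
    lhs = linear_projection(x1 + x2)
    rhs = linear_projection(x1) + linear_projection(x2)
    print(np.allclose(lhs, rhs))             # True: the absolute-square non-linearity is removed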

    Changes in complex spike activity during classical conditioning

    The cerebellar cortex is necessary for adaptively timed conditioned responses (CRs) in eyeblink conditioning. During conditioning, Purkinje cells acquire pause responses or "Purkinje cell CRs" to the conditioned stimulus (CS), resulting in disinhibition of the cerebellar nuclei (CN), allowing them to activate motor nuclei that control eyeblinks. This disinhibition also causes inhibition of the inferior olive (IO) via the nucleo-olivary (N-O) pathway. Activation of the IO, which relays the unconditioned stimulus (US) to the cortex, elicits characteristic complex spikes in Purkinje cells. Although Purkinje cell activity, as well as stimulation of the CN, is known to influence IO activity, much remains to be learned about how learned changes in simple spike firing affect the IO. In the present study, we analyzed changes in simple and complex spike firing in extracellular Purkinje cell recordings from the C3 zone of decerebrate ferrets undergoing training in a conditioning paradigm. In agreement with the N-O feedback hypothesis, acquisition resulted in a gradual decrease in complex spike activity during the conditioned stimulus, with a delay that is consistent with the long N-O latency. Also supporting the feedback hypothesis, training with a short interstimulus interval (ISI), which does not lead to acquisition of a Purkinje cell CR, did not cause a suppression of complex spike activity. In contrast, the observation that extinction did not lead to a recovery in complex spike activity, together with the irregular patterns of simple and complex spike activity after the conditioned stimulus, is less conclusive.

    Bidirectional plasticity of Purkinje cells matches temporal features of learning

    Many forms of learning require temporally ordered stimuli. In Pavlovian eyeblink conditioning, a conditioned stimulus (CS) must precede the unconditioned stimulus (US) by at least about 100 ms for learning to occur. Conditioned responses are learned and generated by the cerebellum. Recordings from the cerebellar cortex during conditioning have revealed CS-triggered pauses in the firing of Purkinje cells that likely drive the conditioned blinks. The predominant view of the learning mechanism in conditioning is that long-term depression (LTD) at parallel fiber (PF)-Purkinje cell synapses underlies the Purkinje cell pauses. This raises a serious conceptual challenge because LTD is most effectively induced at short CS-US intervals, which do not support acquisition of eyeblinks. To resolve this discrepancy, we recorded Purkinje cells during conditioning with short or long CS-US intervals. Decerebrated ferrets trained with CS-US intervals ≥150 ms reliably developed Purkinje cell pauses, but training with an interval of 50 ms unexpectedly induced increases in CS-evoked spiking. This bidirectional modulation of Purkinje cell activity offers a basis for the requirement of a minimum CS-US interval for conditioning, but we argue that it cannot be fully explained by LTD, even when previous in vitro studies of stimulus-timing-dependent LTD are taken into account.